{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Anomalus Vertices Detection in Academia.edu Graph\n",
    "In this notebook I am going to demonstrate how I use the method described in https://arxiv.org/abs/1610.07525 to detect anomlies in Academia.edu social network.\n",
    "Since we don't have a ground truth labels we are going to use the academia.edu dataset with simulated anomalus vertices.\n",
    "\n",
    "To install the packge please read the [installion intruction](https://github.com/Kagandi/anomalous-vertices-detection)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from anomalous_vertices_detection.graph_learning_controller import GraphLearningController\n",
    "from anomalous_vertices_detection.learners.sklearner import SkLearner\n",
    "from anomalous_vertices_detection.datasets.academia import load_data\n",
    "import os"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The basic installation install the packge with networkx as its graph analysis package, it is also possible to install and use iGraph"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First we will define what considered as positive and negative labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "labels = {\"neg\": \"Real\", \"pos\": \"Fake\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we will load the graph of academia.edu.\n",
    "load_data will return a graph object(academia_graph) and a config object(academia_config).\n",
    "Since the acedemia.edu dataset doesn't has real world labels load_data will simulate 10% fake vertices. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading graph...\n",
      "Data loaded.\n",
      "Generating 77017 vertices.\n",
      "77017 fake users generated.\n",
      "847191\n"
     ]
    }
   ],
   "source": [
    "academia_graph, academia_config = load_data(labels_map=labels, simulate_fake_vertices=True)\n",
    "print(len(academia_graph.vertices))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are going to configure a learning controller that will use Scikit as its ml package. (It is also possible to use GraphLab/Dato/Turi however, since Apple bought the company I am going to reffer only SkLearner in most of the notebooks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "glc = GraphLearningController(SkLearner(labels=labels), academia_config)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The reuslt will be written in result_path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "output_folder = \"../output/\"\n",
    "result_path = os.path.join(output_folder, academia_config.name + \"_res.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some of the extracted feature can be usefull for understanding the result but they must not be used in the ml proccess"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "if academia_graph.is_directed:\n",
    "    meta_data_cols = [\"dst\", \"src\", \"out_degree_v\", \"in_degree_v\", \"out_degree_u\", \"in_degree_u\"]\n",
    "else:\n",
    "    meta_data_cols = [\"dst\", \"src\", \"number_of_friends_u\", \"number_of_friends_v\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally we are goigng to execute the classification algorithm."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "ename": "NameError",
     "evalue": "name 'glc' is not defined",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mNameError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-1-a662f2d241a1>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m glc.classify_by_links(academia_graph, result_path, test_size={\"neg\": 1000, \"pos\": 100},\n\u001b[0m\u001b[0;32m      2\u001b[0m                       train_size={\"neg\": 20000, \"pos\": 20000}, meta_data_cols=meta_data_cols)\n",
      "\u001b[1;31mNameError\u001b[0m: name 'glc' is not defined"
     ],
     "output_type": "error"
    }
   ],
   "source": [
    "glc.classify_by_links(academia_graph, result_path, test_size={\"neg\": 1000, \"pos\": 100},\n",
    "                      train_size={\"neg\": 20000, \"pos\": 20000}, meta_data_cols=meta_data_cols)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
